    Modeling Documents with Deep Boltzmann Machines

    We introduce a Deep Boltzmann Machine model suitable for modeling and extracting latent semantic representations from a large unstructured collection of documents. We overcome the apparent difficulty of training a DBM with judicious parameter tying. This parameter tying enables an efficient pretraining algorithm and a state initialization scheme that aids inference. The model can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks. (Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, UAI 2013.)
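
    As a rough illustration of the tied-weight mean-field inference the abstract alludes to, here is a minimal sketch assuming a two-layer model whose top-layer softmax units share the visible weight matrix; the function names, layer sizes, and iteration count are illustrative assumptions, not the paper's reference implementation.

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def softmax(x):
            e = np.exp(x - x.max())
            return e / e.sum()

        def mean_field_inference(v, W, b, M=50, n_iters=10):
            # v: (vocab,) observed word counts for one document
            # W: (vocab, hidden) weight matrix shared, by the tying
            #    assumption, between the words and the top softmax units
            # M: number of tied top-layer softmax units ("soft" words)
            mu2 = np.full(W.shape[0], 1.0 / W.shape[0])
            for _ in range(n_iters):
                # Hidden units see the real words plus M expected soft words.
                h = sigmoid((v + M * mu2) @ W + b)
                # The top layer is driven back through the same tied weights.
                mu2 = softmax(W @ h)
            return h  # latent semantic representation of the document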

    Improving neural networks by preventing co-adaptation of feature detectors

    When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition.
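
    To make the technique concrete, here is a minimal numpy sketch of dropout applied to one layer's activations. It uses the now-common "inverted" scaling so test time needs no adjustment, whereas the paper instead halves the outgoing weights at test time; the names and the rate are illustrative.

        import numpy as np

        def dropout(activations, rate=0.5, rng=np.random.default_rng(0)):
            # Randomly omit a fraction `rate` of the units for this
            # training case; survivors are rescaled by 1/(1 - rate)
            # ("inverted" dropout) so expected activations match test
            # time, when all units are kept.
            mask = rng.random(activations.shape) >= rate
            return activations * mask / (1.0 - rate)

        h = np.array([0.2, 1.5, 0.7, 3.0])
        h_train = dropout(h)   # a different random subnetwork per case
        h_test = h             # test time: keep every unit, no rescaling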

    Simultaneous Localization and Surveying with Multiple Agents

    We apply a constrained Hidden Markov Model architecture to the problem of simultaneous localization and surveying from sensor logs of mobile agents navigating in unknown environments. We show the solution of this problem for the case of one robot and extend our model to the more interesting case of multiple agents that interact with each other through proximity sensors. Since exact learning in this case becomes exponentially expensive, we develop an approximate method for inference using loopy belief propagation and apply it to the localization and surveying problem with multiple interacting robots. In support of our analysis, we report experimental results showing that, with the same amount of data, approximate learning with the interaction signals outperforms exact learning that ignores interactions.
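
    For the exactly solvable single-robot case the abstract starts from, inference reduces to HMM smoothing over discrete map locations. A minimal forward-backward sketch follows; the paper's constrained transition structure and multi-agent loopy belief propagation are omitted, and all names are illustrative.

        import numpy as np

        def forward_backward(obs_lik, trans):
            # obs_lik: (T, S) likelihood of each sensor reading at each
            #          of S candidate map locations
            # trans:   (S, S) motion model, trans[i, j] = P(next j | now i)
            T, S = obs_lik.shape
            alpha = np.zeros((T, S))
            beta = np.ones((T, S))
            alpha[0] = obs_lik[0] / obs_lik[0].sum()
            for t in range(1, T):
                alpha[t] = obs_lik[t] * (alpha[t - 1] @ trans)
                alpha[t] /= alpha[t].sum()         # normalize for stability
            for t in range(T - 2, -1, -1):
                beta[t] = trans @ (obs_lik[t + 1] * beta[t + 1])
                beta[t] /= beta[t].sum()
            post = alpha * beta
            return post / post.sum(axis=1, keepdims=True)  # P(loc_t | logs)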

    Practical Large-Scale Optimization for Max-Norm Regularization

    The max-norm was proposed as a convex matrix regularizer in [1] and was shown to be empirically superior to the trace-norm for collaborative filtering problems. Although the max-norm can be computed in polynomial time, there are currently no practical algorithms for solving large-scale optimization problems that incorporate the max-norm. The present work uses a factorization technique of Burer and Monteiro [2] to devise scalable first-order algorithms for convex programs involving the max-norm. These algorithms are applied to solve huge collaborative filtering, graph cut, and clustering problems. Empirically, the new methods outperform mature techniques from all three areas.
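
    A minimal sketch of the factored approach: parametrize X = L R^T and run projected stochastic gradient descent, rescaling rows so that no row of L or R exceeds the max-norm bound. The constants, the once-per-pass projection, and the function names are illustrative simplifications, not the paper's algorithms.

        import numpy as np

        def project_rows(A, B):
            # Rescale any row whose squared norm exceeds B; in the
            # factored parametrization this enforces the max-norm bound.
            norms = np.linalg.norm(A, axis=1, keepdims=True)
            return A * np.minimum(1.0, np.sqrt(B) / np.maximum(norms, 1e-12))

        def max_norm_mf(ratings, n, m, rank=30, B=4.0, lr=0.05, epochs=20):
            # ratings: list of observed (row i, column j, value r) triples
            rng = np.random.default_rng(0)
            L = rng.normal(scale=0.1, size=(n, rank))
            R = rng.normal(scale=0.1, size=(m, rank))
            for _ in range(epochs):
                for i, j, r in ratings:
                    err = L[i] @ R[j] - r
                    L[i], R[j] = L[i] - lr * err * R[j], R[j] - lr * err * L[i]
                # Project once per pass (a full projected-gradient method
                # would project after every step).
                L, R = project_rows(L, B), project_rows(R, B)
            return L, R   # approximate completion: X_hat = L @ R.T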

    Skip-thought Vectors

    We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage. Sentences that share semantic and syntactic properties are thus mapped to similar vector representations. We next introduce a simple vocabulary expansion method to encode words that were not seen as part of training, allowing us to expand our vocabulary to a million words. After training our model, we extract and evaluate our vectors with linear models on 8 tasks: semantic relatedness, paraphrase detection, image-sentence ranking, question-type classification and 4 benchmark sentiment and subjectivity datasets. The end result is an off-the-shelf encoder that can produce highly generic sentence representations that are robust and perform well in practice. We will make our encoder publicly available. (Supported by the Natural Sciences and Engineering Research Council of Canada, Samsung, the Canadian Institute for Advanced Research, and the United States Office of Naval Research, Grant N00014-14-1-0232.)
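
    The vocabulary expansion step is easy to make concrete: fit a linear map from a large pretrained word-embedding space (e.g., word2vec) into the encoder's word-embedding space using words present in both vocabularies, then push unseen words through that map. A least-squares sketch, with illustrative names:

        import numpy as np

        def fit_vocab_expansion(emb_rnn, emb_w2v):
            # emb_rnn: (n_shared, d_rnn) encoder embeddings of words
            #          seen during training
            # emb_w2v: (n_shared, d_w2v) pretrained embeddings of the
            #          same shared words
            # Solve min_W ||emb_w2v @ W - emb_rnn||^2 in closed form.
            W, *_ = np.linalg.lstsq(emb_w2v, emb_rnn, rcond=None)
            return W  # (d_w2v, d_rnn)

        def embed_unseen(w2v_vector, W):
            # Map a word the encoder never saw into its input space.
            return w2v_vector @ W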